{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "60af5256",
   "metadata": {},
   "source": [
    "# Merge List of Profiles\n",
    "\n",
    "This is an example of a new utils in the dataprofiler for distributed merging of profile objects. This assumes the user is providing a list of profile objects to the utils function for merging all the profiles together."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7eee37ff",
   "metadata": {},
   "source": [
    "## Imports\n",
    "\n",
    "Let's start by importing the necessary packages..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f0d27009",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import sys\n",
    "import json\n",
    "\n",
    "import pandas as pd\n",
    "import tensorflow as tf\n",
    "\n",
    "try:\n",
    "    sys.path.insert(0, '..')\n",
    "    import dataprofiler as dp\n",
    "    from dataprofiler.profilers.profiler_utils import merge_profile_list\n",
    "except ImportError:\n",
    "    import dataprofiler as dp\n",
    "    from dataprofiler.profilers.profiler_utils import merge_profile_list\n",
    "\n",
    "# remove extra tf loggin\n",
    "tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b4369e64",
   "metadata": {},
   "source": [
    "## Setup the Data and Profiler"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "410c3c4d",
   "metadata": {},
   "source": [
    "This section shows the basic example of the Data Profiler. \n",
    "\n",
    "1. Instantiate a Pandas dataframe with dummy data\n",
    "2. Pass the dataframe to the `Profiler` and instantiate two separate profilers in a list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d3567c82",
   "metadata": {},
   "outputs": [],
   "source": [
    "d = {'col1': [1, 2], 'col2': [3, 4]}\n",
    "df = pd.DataFrame(data=d)\n",
    "\n",
    "list_of_profiles = [dp.Profiler(df), dp.Profiler(df)]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "350502eb",
   "metadata": {},
   "source": [
    "Take a look at the list of profiles... "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b649db32",
   "metadata": {},
   "outputs": [],
   "source": [
    "list_of_profiles"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ed4fc12",
   "metadata": {},
   "source": [
    "## Run Merge on List of Profiles\n",
    "\n",
    "Now let's merge the list of profiles into a `single_profile`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4a636047",
   "metadata": {},
   "outputs": [],
   "source": [
    "single_profile = merge_profile_list(list_of_profiles=list_of_profiles)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0aa88720",
   "metadata": {},
   "source": [
    "And check out the `.report` on the single profile:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "34059c21",
   "metadata": {},
   "outputs": [],
   "source": [
    "single_profile.report()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "dataprofiler",
   "language": "python",
   "name": "dataprofiler"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
